Thermostable chaperone-based polypeptide biosynthesis: Enfuvirtide model product quality and protocol-related impurities

Large peptide biosynthesis is a valuable alternative to conventional chemical synthesis. Enfuvirtide, the largest therapeutic peptide used in HIV infection treatment, was synthesized in our thermostable chaperone-based peptide biosynthesis system and evaluated for peptide quality as well as the profile of process-related impurities. Host cell proteins (HCPs) and BrCN cleavage-modified peptides were evaluated by LC-MS in the intermediate product. Cleavage modifications during the reaction were assessed after LC-MS maps were aligned by a simple in-house algorithm, and formylation/oxidation levels were estimated. Circular dichroism spectra of the obtained enfuvirtide were compared to those of the chemically synthesized standard product. Final-product endotoxin and HCP content were assessed, giving 1.06 EU/mg and 5.58 ppm, respectively. Peptide therapeutic activity was measured using the MT-4 cell HIV infection-inhibition model. The biosynthetic peptide IC50 was 0.0453 μM, while that of the standard was 0.0180 μM. The non-acylated N-terminus was proposed as a cause of the IC50 and CD spectra differences. Otherwise, the peptide has met all the requirements of the original chemically synthesized enfuvirtide in the cell-culture and in vivo experiments.


Introduction
Peptides are a rapidly growing therapeutic group with more than 80 different peptides on the market [1]. We have developed a straightforward biosynthesis process for conventional ribosome-synthesized peptides. In previous articles we showed that it can be a useful and robust tool for peptide synthesis [2][3][4]. To prove its usability for the pharmaceutical development of active peptide substances, several points must be examined.
The usual process of active substance development includes target validation, compound screening, secondary assays and in vivo analysis [5].
Starting from the screening, the method of compound synthesis matters. We expect that peptide biosynthesis is fully compatible with phage display technology [6]: its product is an encoded linear amino acid sequence. So, theoretically, most of the obtained peptide sequences can be readily transferred for biosynthesis in our system. The next two steps usually include cell culture experiments and in vivo analysis. Therefore, the final product should meet the following criteria for material for cell culture/in vivo experiments:
• pH/osmolarity compatible
• sterile/virus-free
• apyrogenic
• free of toxic impurities
The pH/osmolarity can be corrected, and particle absence can be ensured during sample preparation, while sterility, the presence of endotoxins and toxic impurity content depend on the material's origin and purification process.
Protein and peptide drugs are thermolabile, and the most common sterilization technique for such samples is filtration. Usually, filtration is enough to prevent microbial growth in a sample, and one can assume that the sample is sterile. However, this assumption does not always hold. Mycoplasmas are among the smallest self-replicating organisms [7]. Viruses are even smaller, and some are small even on the scale of HPLC column pores; for example, porcine circovirus type 1 is about 17 nm in diameter [8]. Therefore, sterilization by filtration is effective only for mycoplasma- and virus-free samples.
Similar to chemical synthesis, E. coli biosynthesis is not a source of mycoplasma or eukaryotic viruses. It is debatable which process, chemical synthesis or biosynthesis, each with its purification steps, is easier to perform in an aseptic manner.
The process impurity profiles of chemically synthesized and biotechnologically derived peptides are different. Chemically synthesized peptides potentially contain products of fragmentation, deletion, β-elimination, and racemization, as well as incompletely deprotected products [9].
In this study, we can expect the following host cell soluble impurities co-purified with our protein: host proteins, host cell DNA, lipids (only endotoxins, as a special case of lipids, will be considered), and cleavage-induced impurities.
The host cell protein (HCP) level is limited by regulation to 100 ng/mg. HCPs are also a heterogeneous group. The most dangerous agents among them are physiologically or enzymatically active proteins, such as bacterial flagellins [10] or proteases [11]. We had the opportunity to explore the co-purifying fraction of HCPs in the intermediate product by HPLC-MS to assess potential biosynthesis system-related hazards directly.
The practical significance of the E. coli residual DNA limit in pharmaceuticals is unclear [12,13]. Immunogenicity concerns have been raised [10]. However, practical study designs usually do not exclude such prominent factors [14] as aggregates, endotoxins and HCPs. We decided to skip the residual DNA test, as DNA is the less concerning impurity for in vitro/in vivo tests.
Unlike chemical synthesis, E. coli is a source of endotoxin, and its content in the peptide must be justified prior to any cell culture or in vivo use.
Endotoxins are lipopolysaccharides (LPS) of gram-negative bacteria and strong activators of innate immunity [15]. Endotoxins cause intense systemic reactions when administered parenterally with a drug formulation and can affect cell cultures [16], so endotoxin levels are limited by regulation.
The cyanogen bromide cleavage process clearly produces protein modifications. Also, this method limits the amino acid composition and leaves a homoserine lactone on the C-terminus of the released peptide. This mode of cleavage was chosen because of its small specific site footprint, low cost, and robustness. This method is not obligatory: the mode of subsequent treatment can be adapted for an individual proposed peptide, with the construct adapted accordingly. The current hydrolysis protocol is expected to be the source of formylated and oxidized forms and of C-terminal homoserine/homoserine lactone [17].
The crucial point of any active substance synthesis is the substance activity itself. Enfuvirtide acts as an antiretroviral fusion inhibitor. MT-4 cells were used as an established in vitro model for anti-HIV drugs [18].
To investigate the difference between the enfuvirtide standard and our sample activities, the CD spectrum of the standard was obtained and compared with the sample CD spectrum.

Materials and methods
Modified GroEL-Enfuvirtide fusion was produced as described previously [3]. The protein with the designed sequence (S1 File) was expressed in E. coli cell culture in soluble form, purified by heat-induced host protein denaturation and IEX chromatography, desalted, lyophilized, and stored at -20˚C.
Cyanogen bromide, reagent grade, from Sigma, USA; buffer components and SDS-PAGE reagents, "for biochemistry" grade, from Amresco, USA; Pierce Unstained Protein MW Marker from Thermo Fisher Scientific, USA; formic acid for biochemistry from AppliChem, USA; and MS-grade solvents from Merck, Germany were used.

The hydrolysis process
Modified GroEL-Enfuvirtide fusion was hydrolyzed at 5 mg/mL concentration in 63.6% formic acid with 0.46 M cyanogen bromide and 4.6% acetonitrile. The reaction mixture was aliquoted and the process was performed at 20˚C in a dark place. Each hour, the corresponding aliquots were quenched by 10-fold dilution with water and immediate freezing in liquid nitrogen. Aliquots were dried in an Alpha 3-4 LSCbasic (Martin Christ, Germany) freeze-dryer at 0.05 mbar pressure and -110˚C condenser temperature.
Alternative protocols test. Two alternatives were compared with the original protocol at 1 h and overnight (16 h): 6 M guanidine with 0.1 M HCl, 0.46 M cyanogen bromide and 4.6% acetonitrile; and TFA instead of FA in the original protocol. The initial protein concentration was equal, and SDS-PAGE samples were prepared with adjustment for the initial substrate concentration.

HCPs search
Tryptic digest was performed in solution with Trypsin Gold (Promega, USA) according to its user manual. Purified fusion protein was studied as an intermediate and purified peptide as the final product.
Samples for the modification search were prepared for HPLC-MS/MS analysis as previously described.

Peptide matching script
The resulting "expected and found" and "unidentified" peptide lists for all 1-7 h samples were combined. The following reference peaks were chosen: the main product and its dimer, three different formylated forms and one oxidized form of the main product. Their molecular weights (MWs) and retention times (Rts) were normalized to theoretical MWs and mean Rts. Recalibration between samples was performed using the mean of the normalized MW and Rt of every sample. The normalized MW and Rt after recalibration were used for SD estimation, and MW and Rt were then scaled by the estimated SD. The pairwise Euclidean distance in the 2D MW-Rt space was calculated, and 3 SD was used as the threshold for peak matching. The resulting adjacency matrix was converted into a list of connected graphs. Graphs were additionally matched by MW because the same modification on different AAs has a different Rt. The "expected" list contained multiple proposed peak options, as individual peaks with different theoretical MWs, for every proteoform; the option nearest by MW was chosen. Within a whole graph, peptides were attributed to the proposed peak option from "expected" nearest to the mean graph MW. If there was no proposed peak within a distance of 3 SD, the automatic attribution by BioCompass was considered incorrect.
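The core matching step described above (SD scaling, pairwise Euclidean distance, 3 SD threshold, connected graphs) can be illustrated with a minimal sketch. This is not the published script: the function name, argument names and the use of scipy's connected-component routine are assumptions made for illustration, and the inputs are assumed to be already recalibrated.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform


def match_peaks(mw, rt, sd_mw, sd_rt, threshold=3.0):
    """Group peaks whose SD-scaled (MW, Rt) distance is within `threshold`.

    Hypothetical helper: mw and rt are recalibrated values pooled over all
    samples; sd_mw and sd_rt are the SDs estimated from the reference peaks.
    """
    # Scale each axis by its SD so that MW and Rt are comparable.
    points = np.column_stack([np.asarray(mw, dtype=float) / sd_mw,
                              np.asarray(rt, dtype=float) / sd_rt])
    # Pairwise Euclidean distances in the scaled 2D space.
    dist = squareform(pdist(points, metric="euclidean"))
    # Adjacency matrix: 1 where two peaks are closer than the threshold.
    adjacency = csr_matrix((dist <= threshold).astype(int))
    # Each connected component is one group of matched peaks.
    n_groups, labels = connected_components(adjacency, directed=False)
    return n_groups, labels
```

Peaks that share a label are treated as the same feature across samples. The additional matching of graphs by MW only, and the attribution against the "expected" list, are not shown here.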
The modification intensity of a sample was calculated as the sum of the modified peak intensities, each multiplied by its modification count (for example, a doubly formylated peak contributes twice its intensity to the formyl total).
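As a small illustration of this weighting, the calculation can be written as below. The data layout (pairs of peak intensity and modification count) and the function name are assumptions, and the numbers in the usage note are made up.

```python
def modification_intensity(peaks):
    """Sum of peak intensity times modification count for one sample.

    `peaks` is a hypothetical list of (intensity, modification_count) pairs;
    unmodified peaks have a count of 0 and therefore contribute nothing.
    """
    return sum(intensity * count for intensity, count in peaks)
```

With made-up peaks `[(100.0, 0), (20.0, 1), (5.0, 2)]`, the singly formylated peak contributes 20 and the doubly formylated peak contributes 10.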

Circular dichroism comparison
Spectra were collected by Chirascan circular dichroism spectrometer (Applied Photophysics, UK) as described previously. The sample was prepared at 2.5 mg/mL concentration in a 30 mM sodium phosphate buffer.
To exclude the impact of formic acid treatment on the secondary structure, the following control sample was prepared: the reference sample was dissolved in the hydrolysis mixture, quenched and freeze-dried, dissolved in the mobile phase (0.1% (v/v) formic acid in 30% acetonitrile), and dried overnight on a rotary vacuum concentrator RVC 2-25 CDplus (Martin Christ, Germany) at 30˚C.
Viruses. The viral stock of the laboratory strain HIV-1 IIIB (NIH AIDS Reagent Program) was obtained during acute infection of MT-4 cells. The virus was stored in aliquots at -80˚C.
Antiviral assay. The antiviral activity of the compounds against the HIV-1 IIIB strain in MT-4 cells was evaluated using a tetrazolium-based colorimetric assay as described recently [19]. Briefly, this method is based on the HIV-induced cytopathic effect (CPE) in MT-4 cells 5 days post infection. The antiviral effects of the test compounds were directly correlated with the inhibition of virus-induced CPE by measuring cell viability using the MTT assay. MT-4 cells (6 x 10^5 cells/mL) were infected with the IIIB virus strain at 100 CCID50 (50% cell culture infectious dose) in the presence of different compound dilutions. Protection against HIV-induced CPE was assessed using the MTT assay 5 days post-challenge. Estimation of the IC50 and its 95% confidence interval (CI) was performed using GraphPad Prism software (8.0.1 Build 244) and the scipy 1.7.3 and uncertainties 3.1.7 Python packages. The GraphPad function Y = Bottom + (Top-Bottom)/(1+10^((LogIC50-X)*HillSlope)) was used for curve fitting with scipy.optimize.curve_fit. The 83% CI was calculated as 1.39 standard deviations and used to estimate the overlap of the standard and sample CIs at 95% probability [20].
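The curve fitting described above can be sketched as follows. This is a sketch, not the protocol script from S2 File: the function names are hypothetical, X is assumed to be log10 of the concentration, and applying the 1.39 SD factor on the log10(IC50) scale is an assumption, since the text does not state the scale.

```python
import numpy as np
from scipy.optimize import curve_fit


def dose_response(x, bottom, top, log_ic50, hill_slope):
    """GraphPad-style four-parameter curve; x is log10(concentration)."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ic50 - x) * hill_slope))


def fit_ic50(log_conc, response, p0, ci_factor=1.39):
    """Fit the curve and return the IC50 with an interval of ci_factor * SD.

    The interval is built on the log10(IC50) scale (an assumption) and
    back-transformed, so it is asymmetric around the IC50.
    """
    popt, pcov = curve_fit(dose_response, log_conc, response, p0=p0)
    log_ic50 = popt[2]
    # SD of the fitted log10(IC50) from the parameter covariance matrix.
    sd_log_ic50 = np.sqrt(pcov[2, 2])
    half_width = ci_factor * sd_log_ic50
    ic50 = 10 ** log_ic50
    interval = (10 ** (log_ic50 - half_width), 10 ** (log_ic50 + half_width))
    return ic50, interval
```

Under this sketch, the intervals returned for the standard and for the sample would be compared for overlap; the returned IC50 is in the same concentration units as the input.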

Other methods
SDS-PAGE in tris-glycine buffer and 10-20% gradient gel was performed as previously described [3].
The endotoxin assay was performed according to the harmonized EP/USP "Bacterial endotoxin" monograph. The maximum valid dilution (MVD) was calculated assuming subcutaneous administration of 2 mg per kg, giving a limit of 2.5 EU/mg. Controls were set according to the "Confirmation of Labeled Lysate Sensitivity" and "Test for Interfering Factors" sections.
The kinetic turbidimetric method was used with a ½ MVD sample concentration. The HCP assay kit «E.coli Host Cell Proteins» (Cygnus Technologies, USA) was used according to its manual.
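The endotoxin limit arithmetic can be reproduced with a short sketch. The threshold K = 5 EU/kg is an assumption: it is the pharmacopoeial value for non-intrathecal parenteral routes and is consistent with the stated 2.5 EU/mg at 2 mg/kg, but the text does not give it explicitly. The sample concentration and assay sensitivity in the usage note are hypothetical, since they are not reported here.

```python
def endotoxin_limit(k_eu_per_kg, dose_mg_per_kg):
    """Endotoxin limit in EU/mg: threshold dose K divided by the dose M."""
    return k_eu_per_kg / dose_mg_per_kg


def maximum_valid_dilution(limit_eu_per_mg, conc_mg_per_ml, lambda_eu_per_ml):
    """MVD = (endotoxin limit x sample concentration) / assay sensitivity."""
    return limit_eu_per_mg * conc_mg_per_ml / lambda_eu_per_ml


# K = 5 EU/kg is assumed; 2 mg/kg is the dose stated in the text.
limit = endotoxin_limit(5.0, 2.0)  # 2.5 EU/mg
```

For example, with a hypothetical 1 mg/mL sample and a hypothetical sensitivity of 0.005 EU/mL, `maximum_valid_dilution(limit, 1.0, 0.005)` gives an MVD of about 500, so a ½ MVD test would use a 250-fold dilution.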

Results
Comparison of guanidine with hydrochloric acid, formic acid and trifluoroacetic acid as cleavage media showed significant sample loss for the guanidine sample and fusion protein fragmentation for trifluoroacetic acid (Fig 1).
The SDS-PAGE of the analytical 0-5 h hydrolysis in formic acid was previously published and showed no visible improvement in process yield after the first hour [3].
HCPs found by LC-MS in the intermediate GroEL-Enfuvirtide product before cyanogen bromide hydrolysis and RP-HPLC purification.
All potentially detected HCPs are listed in S1 File. Host proteins detected with high probability (with a MASCOT score above 90) [21] are presented in Table 1 below.

PLOS ONE
The HCP level in the final product was 5.58 ppm, and the measured bacterial endotoxin level was less than 1.06 EU/mg.
After 1-7 h of BrCN hydrolysis, the peptide concentration and modification levels were assessed by HPLC-MS with subsequent peptide search and peak matching.
The peak matching process revealed an increased distribution density of pairwise distances in the region 0-3 SD (Fig 2).
Formylation proceeds with new multi-formylated forms appearing after the second hour, and S31 is the most vulnerable AA. Oxidation and hydrolysis of C-terminal homoserine lactone to homoserine are much slower (Fig 3A). The main product concentration declined over time (Fig 3B).
CD spectra of the reference peptide and the synthesized sample with additional C-terminal methionine differ greatly (Fig 4). The maximal similarity between sample and standard is observed between the CD spectrum of the sample at 65˚C and that of the standard at 20˚C. The circular dichroism data and visualization script, the mass-spectrometry BioCompass search results and the previously described script, the NIR results with processing and visualization script, and the HCP and IC50 test protocols are included in S2 File.

Discussion
Most of the listed HCPs are abundant E. coli cell proteins and well-known downstream impurities.

Among the established high-risk HCPs [22] are the following:
• Enolase (attributed to drug modification, although the reference publication identifies another protein as the root of the problem [23])
• FKBP-type peptidyl-prolyl cis-trans isomerase SlyD (peptidyl-prolyl cis-trans isomerase A), which has been mentioned as an aggregation factor.
Overall, the absence of proteases is a good feature. However, peptidyl-prolyl cis-trans isomerase, with its ability to change the proper conformation of some proline-containing peptides, and DnaK, with its immunogenicity issues, are significant factors to consider during the subsequent hydrolysis and purification process [22][23][24][25]. In this study, most proteins will be cleaved at M sites, so no enzymatic activity is expected. However, BrCN cleavage usually yields large peptides, so immunogenicity is still a concern for the non-purified product.
The measured endotoxin concentration was less than 1.06 EU/mg. This is lower than the 2.5 EU/mg limit for the high-dosage (90 mg per dose) enfuvirtide drug.
Endotoxins interact with proteins and are readily co-purified during the downstream process [26]. These interactions are expected to have a mixed electrostatic and hydrophobic nature with Ca2+ ions involved [27]. Relying on the final RP-HPLC step, we did not pursue LPS removal during the intermediate GroEL-enfuvirtide purification. So, if re-engineering of the peptide biosynthesis system revises the final RP-HPLC step, some LPS removal measures should be added.
The ordinary ion-exchange chromatography step we used cannot provide significant LPS removal [28] because of the hydrophobic interactions of LPS with the purified protein.
Apparently, a detergent can disrupt hydrophobic interactions between protein and LPS while separation is performed in a different mode: ion exchange, as in this study, or, more generally, any other compatible mode. We propose including an additional non-ionic detergent washing step [29] in that case.
Protein formylation in formic acid solutions at dilutions down to 0.1% is well known [30]. Nonetheless, formic acid usage is practically unavoidable in analysis and, sometimes, in preparative processes.
In this study, for cyanogen bromide cleavage, we had to stay with the formic acid option instead of guanidine or trifluoroacetic acid. We assume that the formation of new low-molecular-weight bands with TFA, or the "shadow" with guanidine and HCl, on the SDS-PAGE gel can result from protein fragmentation by non-specific hydrolysis as well as from low recovery of hydrolysis products.
We re-evaluated the formylated product content and added new peaks that had not been matched automatically.
We used our own framework for peptide matching on the results of the Bruker software. The raw data were processed by BioCompass successfully, resulting in "expected" and "unexpected" peptide lists. Some peaks were listed in "expected" multiple times with different theoretical proteoforms proposed. Also, BioCompass has a peculiarity: after the first discovery of a proteoform on the LC profile, the software ceases the search for that proteoform. Therefore, we had to choose the proper proteoform and validate the assignment, search for its isomers on the LC profile, and match all peptides between the samples.
Specifically, we encountered a task known as LC-MS feature map alignment. There are many LC-MS alignment algorithms for raw map alignment or feature map alignment with various software implementations [31,32]. The Jupyter Notebook is the de facto standard for data exploration [33]. The interactive nature of Jupyter and the flexibility of the Python language offer a great advantage for data processing and presentation. There are some sophisticated Jupyter-ready pipelines designed for raw data processing: AlphaPept [34], BioDendro [35], and TidyMS [36]. However, their entry threshold is high. So we created a minimalistic tool for processed data that meets our needs.
The resulting experimental peptide mass (or Rt) fluctuates depending on several factors. Some of these are unique to a given peptide peak and some apply to the whole LC run. We were able to compensate for run-dependent variability, resulting in a 1.5x SD reduction: we recalibrated the Rt and mass measurements for every sample using a hand-picked set of different reference peaks.
We evaluated peak-dependent dispersion after recalibration. The peak probability distribution in this study was close to normal, so we used the empirical rule to estimate the statistically probable difference between the same peak in different samples after recalibration. After SD scaling, we used Euclidean distance as the symmetric metric of the Rt and mass difference.
The histogram of the distance distribution shows that run-dependent variability was compensated sub-optimally, with a systematic error. This error mostly arises from the mass calibration shift on the high-mass reference peak of dimeric enfuvirtide. This effect can be avoided by increasing the reference peak sample size and using a linear function for calibration correction instead of a single coefficient.
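The difference between the two correction schemes can be illustrated with a minimal sketch. The function names and the least-squares fit via numpy are assumptions, the single coefficient is modelled here as a mean observed-to-theoretical ratio, and the reference masses in the usage note are made up to mimic a monomer-like cluster plus one high-mass dimer-like peak.

```python
import numpy as np


def single_coefficient(observed, theoretical):
    """One scaling factor per run: mean ratio of observed to theoretical."""
    return np.mean(np.asarray(observed) / np.asarray(theoretical))


def linear_calibration(observed, theoretical):
    """Least-squares line mapping observed values onto theoretical values."""
    slope, intercept = np.polyfit(observed, theoretical, deg=1)
    return slope, intercept
```

If a run has both a proportional error and a constant offset, for example observed = 1.00001 x theoretical + 0.02 for made-up reference masses of 4000, 4028, 4056 and 8000, dividing by the single coefficient leaves a residual that is largest on the high-mass peak, while `slope * observed + intercept` removes it.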
Our algorithm is close to the gross alignment in XAlign [37], but we use the mean Rt instead of the median Rt for Rt recalibration and added mass recalibration by the theoretical mass of manually validated assigned peptides. The empirical rule in our method plays the same role as the two-dimensional Kolmogorov-Smirnov (K-S) test in XAlign.
After initial peak matching, the mass measurement was refined and peptide isomers were matched by mass only.
As we can see, a product obtained with the standard overnight BrCN protocol is heavily modified. Modification rates on the 1-7 h interval are close to linear, so hydrolysis time optimization is an effective way to facilitate further purification and increase yield. Hydrolysis time optimization on a model peptide is not justified: susceptibility to BrCN hydrolysis differs for different M-X pairs and, presumably, for different sequences [38].
The formyl modification occurs naturally: N-formyl methionine peptides are involved in chemotaxis and inflammation [39], and lysine formylation of histones has an epigenetic function [40]. Formylation definitely impacts peptide charge and conformation, thus affecting the activity, and should be avoided.
Formylation is temperature-dependent [41], like any other chemical reaction, including the BrCN cleavage itself, so there is no obvious point in low-temperature hydrolysis.
Oxidation depends on BrCN reagent quality [17]; in our study we used a properly stored, but not freshly opened, reagent.
Circular dichroism data show that the reference enfuvirtide has less α-helical structure than the biosynthesized one. A possible explanation may include involvement of the free N-terminus in secondary structure formation (DichroWeb structure approximations in S1 File). Additionally, structural differences were registered by NIR spectra [42] (method in S1 File).
The revealed activity difference between standard and biosynthesized enfuvirtide has its roots in their chemical and/or conformational nonidentity.
Firstly, standard enfuvirtide has its N- and C-termini endcapped by acetylation and amidation. The biosynthesized one has a free N-terminus and an additional homoserine lactone on the C-terminus. The latter has much lower polarity than a free C-terminus and in that way is comparable with an amidated C-terminus. However, they differ sterically. The free tyrosine N-terminus is basic and will be charged under physiological conditions.
Wild and colleagues presented α-helical peptides as anti-HIV agents and suggested endcapping because such peptides are derived from a longer chain: "The N terminus of the peptide was acetylated and the C terminus amidated to reduce unnatural charge effects at those positions" [43]. Therefore, most published antiretroviral α-helical peptides have their N- and C-termini blocked [44][45][46][47].
Zhang and colleagues described the model complex of gp41-derived peptide N39 and enfuvirtide: ". . .the first residue Tyr-127 not only interacts with His-53 on N39 by a hydrogen bond but also contacts with Leu-54 by hydrophobic force. . ." [49, p3] and ". . .Trp-155 and Ala-156 mediate hydrophobic contacts with Leu-26, which locate at the TRM (tryptophan-rich motif) of T20 and the FPPR (fusion peptide proximal region) of N39, respectively. . ." [49, p3]. Also, there are data showing that the C-terminus of enfuvirtide interacts with the cell membrane while blocking cell and viral membrane fusion, and that a hydrophobic addition such as an octyl residue on the C-terminus boosts the peptide activity [50]. So, the absence of charged N- and C-termini is crucial for maximal affinity.
Therefore, the activity difference is most likely explained by the charged non-acetylated N-terminus of the biosynthesized sample and less likely by its sterically different homoserine lactone on the C-terminus.
Further investigation of the activity difference may include selective N-terminal acetylation of the biosynthesized enfuvirtide. However, this procedure is at the state of the art of selective modification [51], and its use on a model peptide is not justified. In silico methods for estimating the contributions of the non-acetylated N-terminus and the C-terminal homoserine lactone to the activity difference can be of some help. Yet, the result would not be ground truth, and such an estimation would not assess the general applicability of the present peptide biosynthesis method.

Conclusion
The proposed biosynthesis method yields a model peptide product with cell-culture/in vivo-ready purity. Our method has several advantages: no chemical synthesis reagents or consumables are needed, less toxic solvents are used, it can be introduced in a common biotech lab, and the E. coli strain can be stored at -80˚C for subsequent fermentations. It can be used as is for ribosomal biosynthesis of large peptides without internal methionine, and for peptides with internal methionine after revision of the cleavage site and cleavage protocol.
The enfuvirtide model shows nuances in introducing required modifications (like N-terminus acylation in our case) for biotechnologically derived peptides. Necessary modifications, if any, should be considered at the outset.
Overall, the proposed biosynthesis method is suitable for a significant proportion of large (up to 50 AA) peptide biosynthesis cases. A sufficient purification step is included, so the method can be applied as presented.
There is room for further upgrades of the method. Future development tracks are outlined: the introduction of a His-tag for native conditions is underway; the ability of the system to accommodate peptides as large as 100 AA and alternative cleavage protocols are discussed.